Scalable reduction of large datasets to interesting subsets

Authors

Gregory Todd Williams

Jesse Weaver

Medha Atre

James A. Hendler

Abstract

With a huge amount of RDF data available on the web, the ability to find and access relevant information is crucial. Traditional approaches to storing, querying, and inferencing fall short when faced with web-scale data. We present a system that combines the computational power of large clusters for enabling large-scale inferencing and data access with an efficient data structure for storing and querying this accessed data on a traditional personal computer or smaller embedded device. We present results of using this system to load the Billion Triples Challenge dataset, fully materialize RDFS inferences, and extract an “interesting” subset of the data using a large cluster, and further analyze the extracted data using a traditional personal computer.

Similar resources

Size Matters – Revealing Small Scale Structures in Large Datasets

The size of datasets generated in the medical imaging community is increasing faster than additional processing resources are made available. Even if we can surmount the hurdles in large data handling and processing, the amount of information encoded in these datasets is overwhelming. Therefore effective visualization techniques must allow a user to identify and focus on scientifically interest...

Identifying Information-Rich Subspace Trends in High-Dimensional Data

Identifying information-rich subsets in high-dimensional spaces and representing them as order revealing patterns (or trends) is an important and challenging research problem in many science and engineering applications. The information quotient of large-scale high-dimensional datasets is significantly reduced by the curse of dimensionality which makes the traditional clustering and association...

Scalable Image Annotation by Summarizing Training Samples into Labeled Prototypes

As the number of images grows, it is essential to provide fast search methods and intelligent filtering of images. To handle images in large datasets, some relevant tags are assigned to each image to describe its content. Automatic Image Annotation (AIA) aims to automatically assign a group of keywords to an image based on the visual content of the image. AIA frameworks have two main sta...

Dynamic Data Citation

Being able to reliably and efficiently cite entire or subsets of data in large and dynamically growing or changing datasets constitutes a significant challenge for a range of research domains. Current approaches rely on pointers to entire data collections or on explicit copies of data. They do not scale with large quantities of data. Hence a new method is required that enables to create, refere...

Sample-oriented Domain Adaptation for Image Classification

Image processing is a method to perform some operations on an image, in order to get an enhanced image or to extract some useful information from it. Conventional image processing algorithms cannot perform well in scenarios where the training images (source domain) that are used to learn the model have a different distribution from the test images (target domain). Also, many real world applicat...

Journal title:

J. Web Sem.

Volume 8

Publication date: 2010
